- Skill Validator
- Validate implementations match manifests using Codex for semantic comparison
- Purpose
- Ensures that skill/MCP implementations actually deliver what their manifests promise. Uses Codex to perform semantic analysis comparing descriptions, preconditions, and effects against actual code. Detects drift, missing functionality, and over-promised capabilities.
- When to Use
- After updating skill implementations
- During quality audits to verify accuracy
- When manifests feel outdated or incorrect
- To detect description-implementation drift
- Before releasing new versions of resources
- Key Capabilities
- Semantic Comparison
-
- Uses Codex to understand if code matches description
- Precondition Validation
-
- Verifies claimed preconditions are actually checked
- Effect Verification
-
- Confirms code produces claimed effects
- API Surface Analysis
-
- Validates exposed functions match manifest
- Drift Detection
-
- Identifies when implementation diverges from manifest
- Coverage Scoring
- Measures how much of manifest is implemented Inputs inputs : resource_path : string
Path to skill/MCP directory
manifest_path : string
Path to manifest.yaml (default: resource_path/manifest.yaml)
implementation_path : string
Path to code (default: resource_path/index.js)
strict_mode : boolean
Fail on warnings (default: false)
Process Step 1: Load Manifest and Implementation
!/bin/bash
Load manifest and implementation
RESOURCE_PATH
" $1 " MANIFEST_PATH = " ${2 :- $RESOURCE_PATH / manifest.yaml} " IMPL_PATH = " ${3 :- $RESOURCE_PATH / index.js} " if [ ! -f " $MANIFEST_PATH " ] ; then echo "❌ Manifest not found: $MANIFEST_PATH " exit 1 fi if [ ! -f " $IMPL_PATH " ] ; then
Try alternative extensions
if [ -f " $RESOURCE_PATH /index.ts" ] ; then IMPL_PATH = " $RESOURCE_PATH /index.ts" elif [ -f " $RESOURCE_PATH /SKILL.md" ] ; then
Skill might be declarative only
IMPL_PATH
"" else echo "⚠️ No implementation file found, validating description only" IMPL_PATH = "" fi fi
Read manifest
MANIFEST
$( cat " $MANIFEST_PATH " )
Read implementation (if exists)
if [ -n " $IMPL_PATH " ] ; then IMPLEMENTATION = $( cat " $IMPL_PATH " ) else IMPLEMENTATION = "" fi Step 2: Validate Description Accuracy
Use Codex to compare description with implementation
- codex
- exec
- "
- Compare this manifest description with the actual implementation:
- MANIFEST:
- $MANIFEST
- IMPLEMENTATION:
- $IMPLEMENTATION
- Questions:
- 1. Does the implementation match the description?
- 2. Are there features described but not implemented?
- 3. Are there features implemented but not described?
- 4. Is the description accurate and complete?
- Output JSON:
- {
- \"
- description_accurate
- \"
-
- boolean,
- \"
- missing_features
- \"
-
- [
- \"
- feature1
- \"
- ,
- \"
- feature2
- \"
- ],
- \"
- undocumented_features
- \"
-
- [
- \"
- feature3
- \"
- ],
- \"
- accuracy_score
- \"
-
- 0.0-1.0,
- \"
- issues
- \"
- [
{
\"
type
\"
:
\"
missing_feature
\"
,
\"
severity
\"
:
\"
high|medium|low
\"
,
\"
description
\"
:
\"
...
\"
,
\"
suggestion
\"
:
\"
...
\"
}
]
}
"
/tmp/validation-description.json Step 3: Validate Preconditions
Check if preconditions are actually enforced in code
PRECONDITIONS
$( python3 -c " import yaml, json manifest = yaml.safe_load ( open ( '$MANIFEST_PATH' ) ) print ( json.dumps ( manifest.get ( 'preconditions' , [ ] ) , indent = 2 )) ") codex exec " Analyze if these preconditions are actually checked in the code: PRECONDITIONS: $PRECONDITIONS IMPLEMENTATION: $IMPLEMENTATION For each precondition, determine: 1 . Is it checked in the code? 2 . Where is it checked ( function name, line number ) ? 3 . Does it fail gracefully if not met? 4 . Is the error message clear? Output JSON: { \ "preconditions_validated \ ": [ { \ "check \ ": \ "file_exists ( 'package.json' ) \ ", \ "enforced \ ": boolean, \ "location \ ": \ "function:line \ ", \ "error_handling \ ": \ "good | poor | missing \ ", \ "suggestion \ ": \ " .. . \ " } ] , \ "coverage_score \ ": 0.0 -1.0 } "
/tmp/validation-preconditions.json Step 4: Validate Effects
Check if claimed effects are actually produced
EFFECTS
$( python3 -c " import yaml, json manifest = yaml.safe_load ( open ( '$MANIFEST_PATH' ) ) print ( json.dumps ( manifest.get ( 'effects' , [ ] ) , indent = 2 )) ") codex exec " Verify that this code actually produces the claimed effects: CLAIMED EFFECTS: $EFFECTS IMPLEMENTATION: $IMPLEMENTATION For each effect, determine: 1 . Is the effect actually produced? 2 . Where in the code does it happen? 3 . Are there conditions where it might not happen? 4 . Are there other effects not listed? Output JSON: { \ "effects_validated \ ": [ { \ "effect \ ": \ "creates_vector_index \ ", \ "implemented \ ": boolean, \ "location \ ": \ "function:line \ ", \ "conditional \ ": boolean, \ "confidence \ ": 0.0 -1.0 } ] , \ "missing_effects \ ": [ \ "effect1 \ " ] , \ "extra_effects \ ": [ \ "effect2 \ " ] , \ "coverage_score \ ": 0.0 -1.0 } "
/tmp/validation-effects.json Step 5: Validate API Surface
For MCPs/tools with defined APIs, validate exports
- if
- [
- -n
- "
- $IMPLEMENTATION
- "
- ]
- ;
- then
- codex
- exec
- "
- Analyze the API surface of this implementation:
- IMPLEMENTATION:
- $IMPLEMENTATION
- Questions:
- 1. What functions/classes are exported?
- 2. What are their signatures?
- 3. Are they documented?
- 4. Do they match what the manifest describes?
- Output JSON:
- {
- \"
- exports
- \"
-
- [
- {
- \"
- name
- \"
- :
- \"
- functionName
- \"
- ,
- \"
- type
- \"
- :
- \"
- function|class|object
- \"
- ,
- \"
- signature
- \"
- :
- \"
- (args) => result
- \"
- ,
- \"
- documented
- \"
-
- boolean
- }
- ],
- \"
- api_complete
- \"
- boolean,
\"
documentation_quality
\"
:
\"
good|fair|poor
\"
}
"
/tmp/validation-api.json fi Step 6: Generate Validation Report
Combine all validation results
python3 << 'PYTHON_SCRIPT' import json from datetime import datetime
Load validation results
with open('/tmp/validation-description.json') as f: desc_validation = json.load(f) with open('/tmp/validation-preconditions.json') as f: precond_validation = json.load(f) with open('/tmp/validation-effects.json') as f: effects_validation = json.load(f) try: with open('/tmp/validation-api.json') as f: api_validation = json.load(f) except FileNotFoundError: api_validation = None
Calculate overall score
scores = [ desc_validation.get('accuracy_score', 0), precond_validation.get('coverage_score', 0), effects_validation.get('coverage_score', 0) ] overall_score = sum(scores) / len(scores)
Collect all issues
all_issues = [] all_issues.extend(desc_validation.get('issues', [])) for precond in precond_validation.get('preconditions_validated', []): if not precond.get('enforced'): all_issues.append({ 'type': 'unenforced_precondition', 'severity': 'medium', 'description': f"Precondition not enforced: {precond['check']}", 'suggestion': precond.get('suggestion', '') }) for effect in effects_validation.get('effects_validated', []): if not effect.get('implemented'): all_issues.append({ 'type': 'unimplemented_effect', 'severity': 'high', 'description': f"Effect not implemented: {effect['effect']}", 'suggestion': 'Implement this effect or remove from manifest' })
Generate report
report = { 'resource': 'RESOURCE_NAME_PLACEHOLDER', 'validated_at': datetime.utcnow().isoformat() + 'Z', 'overall_score': round(overall_score, 3), 'scores': { 'description_accuracy': desc_validation.get('accuracy_score', 0), 'precondition_coverage': precond_validation.get('coverage_score', 0), 'effect_coverage': effects_validation.get('coverage_score', 0) }, 'validation_results': { 'description': desc_validation, 'preconditions': precond_validation, 'effects': effects_validation, 'api': api_validation }, 'issues': all_issues, 'issue_count': len(all_issues), 'passed': overall_score >= 0.8 and len([i for i in all_issues if i['severity'] == 'high']) == 0 } with open('/tmp/validation-report.json', 'w') as f: json.dump(report, f, indent=2)
Print summary
- print(f"\nValidation Score: {overall_score:.2f}")
- print(f"Issues Found: {len(all_issues)}")
- print(f"Status: {'✅ PASSED' if report['passed'] else '❌ FAILED'}")
- PYTHON_SCRIPT
- Validation Criteria
- Scoring Rules
- // Overall score is average of component scores
- overallScore
- =
- (
- descriptionAccuracy
- +
- preconditionCoverage
- +
- effectCoverage
- )
- /
- 3
- // Pass criteria
- passed
- =
- overallScore
- >=
- 0.8
- &&
- highSeverityIssues
- .
- length
- ===
- 0
- Severity Levels
- High
-
- Missing core functionality, unenforced preconditions, unimplemented effects
- Medium
-
- Incomplete features, poor error handling, undocumented exports
- Low
- Minor inconsistencies, documentation gaps, style issues Example Output { "resource" : "rag-implementer" , "validated_at" : "2025-10-28T12:00:00Z" , "overall_score" : 0.85 , "scores" : { "description_accuracy" : 0.9 , "precondition_coverage" : 0.8 , "effect_coverage" : 0.85 } , "validation_results" : { "description" : { "description_accurate" : true , "missing_features" : [ ] , "undocumented_features" : [ "vector_index_optimization" ] , "accuracy_score" : 0.9 , "issues" : [ { "type" : "undocumented_feature" , "severity" : "low" , "description" : "Implementation includes vector optimization not mentioned in manifest" , "suggestion" : "Add 'optimizes_vector_queries' to effects" } ] } , "preconditions" : { "preconditions_validated" : [ { "check" : "file_exists('package.json')" , "enforced" : true , "location" : "validateProject:12" , "error_handling" : "good" } , { "check" : "env_var_set('OPENAI_API_KEY')" , "enforced" : true , "location" : "setupEmbeddings:45" , "error_handling" : "good" } ] , "coverage_score" : 0.8 } , "effects" : { "effects_validated" : [ { "effect" : "creates_vector_index" , "implemented" : true , "location" : "createIndex:120" , "conditional" : false , "confidence" : 0.95 } , { "effect" : "adds_embedding_pipeline" , "implemented" : true , "location" : "setupPipeline:85" , "conditional" : false , "confidence" : 0.9 } ] , "missing_effects" : [ ] , "extra_effects" : [ "optimizes_vector_queries" ] , "coverage_score" : 0.85 } } , "issues" : [ { "type" : "undocumented_feature" , "severity" : "low" , "description" : "Implementation includes vector optimization not mentioned in manifest" , "suggestion" : "Add 'optimizes_vector_queries' to effects" } ] , "issue_count" : 1 , "passed" : true } Integration With manifest-generator Validates that generated manifests are accurate by comparing with implementation. With capability-graph-builder Ensures graph relationships are based on accurate capability descriptions. CI/CD Pipeline
.github/workflows/validate-skills.yml
name : Validate Skills on : [ push , pull_request ] jobs : validate : runs-on : ubuntu - latest steps : - uses : actions/checkout@v3 - name : Validate all skills run : | for skill in SKILLS/*/; do bash SKILLS/skill-validator/validate.sh "$skill" done Success Metrics ✅ All skills score >= 0.8 ✅ No high severity issues in production skills ✅ 100% of preconditions enforced ✅ 95%+ of effects implemented ✅ API surface matches manifest